Show the code
pacman::p_load(sf, tidyverse, questionr, janitor, psych, ggplot2, gcookbook, tmap, ggpubr, egg, corrplot, gtsummary, regclass, caret, heatmaply, ggdendro, cluster, factoextra, spdep, ClustGeo, GGally, skimr, stringr)December 5, 2022
case study : Regionalisation by Multivariate Water Point Attributes with Non-spatially Constrained and Spatially Constrained Clustering Methods.
Regionalisation with Spatially Constrained clustering analysis requires similar observations to be grouped according to their statistical attributes and spatial location.
Total number of water points by status, i.e. functional, non-functional, and unknown;
Percentage of water points by :
status (functional, non-functional, and unknown);
deployed water technology (hand pump, mechanical pump, stand tap, etc.) ;
usage capacity (1000, 300, 250, 50);
rural or urban.
Alpha-3 Code : NGA
Population : 225 million (1st in Africa, 6th globally)
Local Government Areas (LGA) : 774
Water Point Observations : 95,008
Environmental Aspects :
Geography :
Southwest - “rugged” highland.
Southeast - hills and mountains, which form the Mambilla Plateau, the highest plateau in Nigeria.
Hydrology :
Vegetation Coverage :
Lost nearly 80% of primary forest by 2012.3
States with dense forests concentrated : Bayelsa, Cross River, Edo, Ekiti, Ondo, Osun, Rivers, and Taraba.
1 Wikipedia. Nigeria. https://en.wikipedia.org/wiki/Nigeria
2 Ogbonna, D.N., Ekweozor, I.K.E., Igwe, F.U. (2002). “Waste Management: A Tool for Environmental Protection in Nigeria”. Ambio: A Journal of the Human Environment. 31 (1): 55–57. doi:10.1639/0044-7447(2002)031[0055:wmatfe]2.0.co;2.
3 https://rainforests.mongabay.com/20nigeria.htm
Three (3) Projected Coordinate Systems of Nigeria, EPSG : 26391, 26392, and 26303.
derive the proportion of functional and non-functional water points at LGA level (i.e. ADM2) by appropriate tidyr and dplyr methods;
combine geospatial and aspatial data frames into a simple feature data frame.
delineate water points measures functional regions by using :
conventional hierarchical clustering.
spatially constrained clustering algorithms.
plot two (2) main types of maps below :
Thematic Mapping
Show the derived water-point measures by appropriate statistical graphics and choropleth mapping technique.
Analytical Mapping
Plot delineated functional regions using non-spatially constrained and spatially constrained clustering algorithms.
The following are the packages required for this exercise :
sf package :
st_set_crs( ) - 3.4.5.20
stars package :
readr package :
dplyr :
tidyr :
ggplot2 package :
egg package :
corrplot package :
janitor package :
Usage of the code chunk below :
p_load( ) - pacman - to load packages. This function will attempt to install the package from CRAN or pacman repository list if its found not installed.
Aspatial Data
The file size of the downloaded data is about 422 MB due to water points data from multiple countries.
Hence, to avoid any error in pushing files larger than 100 MB to Git, filtered Nigeria water points and removed unnecessary variables before uploading into the R environment.
Therewith, the CSV file size should be lesser than 100 MB.
Geospatial Data
4 Runfola, D. et al. (2020) geoBoundaries: A global database of political administrative boundaries. PLoS ONE 15(4): e0231866. https://doi.org/10.1371/journal.pone.0231866
Four (4) data frames to be created for different context, i.e.
wp_coord = coordinated related variables.
wp_cond = status and conditions related variables.
wp_adm = administrative and measurements related variables.
wp = master file that combine all three (3) data frames above.
Usage of the code chunk below :
read_csv( ) - readr - to import and save the comma separated value (CSV) file as a data frame, with title “wp_coord”.
rename( ) - dplyr - to remove “#” from the variables.
problems( ) - readr - to reveal any parsing errors when importing the CSV file.
Remarks :
Upload and create new data frames according to the context of the variables. Therewith, these data frames can be used as and when the requirements fit the context thereof.
Usage of the code chunk below :
write_rds( ) - readr - to save wp_coord data table into an RDS format.
note : reduce the file size with this function -> compress = “xz”.
read_rds( ) - readr - to read wp_coord RDS file into wp_coord.
Usage of the code chunk below :
skim( ) - skimr - to get a broad overview of wp_coord data frame.
| Name | wp_coord |
| Number of rows | 95008 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| New Georeferenced Column | 0 | 1 | 11 | 45 | 0 | 95008 | 0 |
| lat_lon_deg | 0 | 1 | 8 | 42 | 0 | 95008 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| row_id | 0 | 1 | 199975.48 | 189726.13 | 10732.00 | 52632.75 | 86952.50 | 323671.50 | 681838.00 | ▇▃▂▂▁ |
| lat_deg | 0 | 1 | 9.33 | 2.48 | 4.30 | 7.36 | 9.09 | 11.83 | 13.87 | ▃▇▅▅▆ |
| lon_deg | 0 | 1 | 7.50 | 2.25 | 2.71 | 5.52 | 7.89 | 9.08 | 14.22 | ▃▃▇▃▁ |
wp_cond <- read_csv("data/aspatial/WPdx_NGAv1.2.1.csv",
col_select = c(`row_id`,
`#water_source`,
`#water_source_clean`,
`#water_source_category`,
`#water_tech_clean`,
`#water_tech_category`,
`#status_clean`,
`#status`)) %>%
rename(water_source = `#water_source`,
water_source_clean = `#water_source_clean`,
water_source_category = `#water_source_category`,
water_tech_clean = `#water_tech_clean`,
water_tech_category = `#water_tech_category`,
status_clean = `#status_clean`,
status = `#status`)
problems(wp_cond)| Name | wp_cond |
| Number of rows | 95008 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 7 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| water_source | 0 | 1.00 | 3 | 32 | 0 | 16 | 0 |
| water_source_clean | 302 | 1.00 | 8 | 22 | 0 | 5 | 0 |
| water_source_category | 302 | 1.00 | 4 | 11 | 0 | 3 | 0 |
| water_tech_clean | 10055 | 0.89 | 8 | 26 | 0 | 11 | 0 |
| water_tech_category | 10055 | 0.89 | 8 | 15 | 0 | 4 | 0 |
| status_clean | 10656 | 0.89 | 9 | 32 | 0 | 8 | 0 |
| status | 10656 | 0.89 | 14 | 156 | 0 | 834 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| row_id | 0 | 1 | 199975.5 | 189726.1 | 10732 | 52632.75 | 86952.5 | 323671.5 | 681838 | ▇▃▂▂▁ |
wp_adm <- read_csv("data/aspatial/WPdx_NGAv1.2.1.csv",
col_select = c(`row_id`,
`#clean_adm1`,
`#clean_adm2`,
`water_point_population`,
`local_population_1km`,
`crucialness_score`,
`pressure_score`,
`usage_capacity`,
`staleness_score`,
`rehab_priority`,
`is_urban`)) %>%
rename(clean_adm1 = `#clean_adm1`,
clean_adm2 = `#clean_adm2`)
problems(wp_adm)| Name | wp_adm |
| Number of rows | 95008 |
| Number of columns | 11 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| logical | 1 |
| numeric | 8 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| clean_adm1 | 0 | 1 | 3 | 25 | 0 | 37 | 0 |
| clean_adm2 | 0 | 1 | 3 | 19 | 0 | 753 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| is_urban | 0 | 1 | 0.21 | FAL: 75444, TRU: 19564 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| row_id | 0 | 1.00 | 199975.48 | 189726.13 | 10732.00 | 52632.75 | 86952.50 | 323671.50 | 681838.00 | ▇▃▂▂▁ |
| water_point_population | 539 | 0.99 | 1246.32 | 4027.41 | 0.00 | 117.00 | 413.00 | 1169.00 | 384595.00 | ▇▁▁▁▁ |
| local_population_1km | 539 | 0.99 | 3723.15 | 7417.59 | 0.00 | 597.00 | 1756.00 | 4393.00 | 384595.00 | ▇▁▁▁▁ |
| crucialness_score | 6879 | 0.93 | 0.41 | 0.34 | 0.00 | 0.13 | 0.30 | 0.63 | 1.00 | ▇▅▃▁▅ |
| pressure_score | 6879 | 0.93 | 3.21 | 9.04 | 0.00 | 0.40 | 1.18 | 3.10 | 776.97 | ▇▁▁▁▁ |
| usage_capacity | 0 | 1.00 | 488.63 | 310.95 | 50.00 | 300.00 | 300.00 | 1000.00 | 1000.00 | ▁▇▁▁▃ |
| staleness_score | 0 | 1.00 | 44.94 | 6.29 | 23.13 | 41.49 | 42.87 | 44.34 | 99.00 | ▁▇▁▁▁ |
| rehab_priority | 53109 | 0.44 | 1545.45 | 5243.53 | 0.00 | 136.50 | 522.00 | 1527.00 | 384595.00 | ▇▁▁▁▁ |
Remarks :
“staleness_score” indicates the depreciation of the data’s relevance.
The observation updated within 1 year has a “staleness_score” of approximately ~ 89.13 or higher.
Water points data collection is usually done quarterly or annually.
Based on the code chunk below, only 11 out of 95008 water points are updated within 1 year (between June 2021 to Aug 2022), meaning the rest, which is almost entire observations (the latest updated date was in Apr 2020), are outdated.
These outdated observations will need to be updated for the local governments or entities to allocate resources effectively in managing or upgrading these water points.
The “New Georeferenced Column” in wp_rds contains spatial data in a WKT format.
Two (2) steps to convert the WKT data format into an sf data frame.
Usage of the code chunk below :
st_sf( ) - sf - to convert the tibble data frame into sf data frame with crs first set to WGS 84 (EPSG : 4326).
st_crs( ) - sf - to retrieve coordinate reference system from the object.
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
::: {.callout-warning appearance=“simple” icon=“false”} ::: {.callout-alert appearance=“simple” icon=“false”}
Usage of the code chunk below :
st_read( ) - sf - to read simple features.
select( ) - dplyr - to select “shapeName” variable. :::
Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
user-defined `sfl` provided. Falling back to `character`.
| Name | bdy_nga |
| Number of rows | 774 |
| Number of columns | 2 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| shapeName | 0 | 1 | 3 | 18 | 0 | 768 | 0 |
| geometry | 0 | 1 | 878 | 33370 | 0 | 774 | 0 |
Remarks :
There is no missing data.
There 774 unique “geometry” but only 768 unique “shapeName”
Usage of the code chunk below :
add_count( ) - dplyr - to count observations by group
filter( ) - dplyr - to retain shapeName that has count not equal to 1.
Usage of the code chunk below :
tmap_mode( ) - tmap - to set tmap mode to static plotting or interactive.
tm_shape( ) - tmap - to specify the shape object.
tm_polygons( ) - tmap - to fill the polygons and draw the polygon borders.
tm_view( ) - tmap - to set the options for the interactive tmap viewer.
tm_fill( ) - tmap - to specify either which colour to be used or which data variable mapped to the colour palette.
tm_borders( ) - tmap - to draw the polygon borders.
tmap_style( ) - tmap - to set the tmap style.
tm_layout( ) - tmap - to set the layout of cartographic map.
tmap mode set to interactive viewing
tm_shape(bdy_nga)+
tm_polygons()+
tm_view(set.zoom.limits = c(6,8))+
tm_shape(dupl_shapeName)+
tm_fill("shapeName",
n = 6,
style = "jenks")+
tm_borders(alpha = 0.5)+
tmap_style("albatross")+
tm_layout(main.title = "Distribution of Duplicated ShapeName",
main.title.size = 1.3,
main.title.position = "center")tmap style set to "albatross"
other available styles are: "white", "gray", "natural", "cobalt", "col_blind", "beaver", "bw", "classic", "watercolor"
tmap mode set to plotting
Remarks :
The plot above indicates those duplicated water points are not within the same location.
The State info to be combined with the duplicated “shapeName”. This will make all the shapeName unique.
Simple feature collection with 12 features and 2 fields
Active geometry column: geometry
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 3.316459 ymin: 6.459038 xmax: 9.020704 ymax: 12.05035
Geodetic CRS: WGS 84
First 10 features:
shapeName bdy_nga$shapeName geometry
1 Bassa Bassa MULTIPOLYGON (((6.708541 7....
2 Bassa Bassa MULTIPOLYGON (((8.823522 10...
3 Ifelodun Ifelodun MULTIPOLYGON (((4.664107 8....
4 Ifelodun Ifelodun MULTIPOLYGON (((4.721977 7....
5 Irepodun Irepodun MULTIPOLYGON (((5.05493 8.0...
6 Irepodun Irepodun MULTIPOLYGON (((4.543349 7....
7 Nasarawa Nasarawa MULTIPOLYGON (((8.554589 11...
8 Nasarawa Nasarawa MULTIPOLYGON (((7.493228 8....
9 Obi Obi MULTIPOLYGON (((8.191919 6....
10 Obi Obi MULTIPOLYGON (((9.008576 8....
st_centroid(dupl_shapeName$geometry, of_largest_polygon = FALSE)
1 POINT (7.031827 7.791971)
2 POINT (8.782946 10.08015)
3 POINT (5.052235 8.544311)
4 POINT (4.636735 7.920948)
5 POINT (4.926215 8.169349)
6 POINT (4.498797 7.84861)
7 POINT (8.578262 12.00446)
8 POINT (7.760272 8.304034)
9 POINT (8.281026 7.022495)
10 POINT (8.734777 8.35534)
Rows: 12
Columns: 3
$ shapeName <chr> "Bassa", "Bassa", "Ifelodun", "Ifelodun", "Irepodu…
$ `bdy_nga$shapeName` <chr> "Bassa", "Bassa", "Ifelodun", "Ifelodun", "Irepodu…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((6.708541 7...., MULTI…
| lga | row_id | headquarter | state | iso3166code | dupl_shapeName_coord | lga_office_coord |
|---|---|---|---|---|---|---|
| Bassa | 94 | Oguma | Kogi | NG.KO.BA | 7.791971, 7.031827 | 7.80932, 6.74853 |
| Bassa | 95 | Bassa | Plateau | NG.PL.BA | 10.08015, 8.782946 | 10.11143, 8.71559 |
| Ifelodun | 304 | Share | Kwara | NG.KW.IF | 8.544311, 5.052235 | 8.5 5.0 |
| Ifelodun | 305 | Ikirun | Osun | NG.OS.ID | 7.920948, 4.636735 | 7.5 4.5 |
| Irepodun | 355 | Omu Aran | Kwara | NG.KW.IR | 8.169349, 4.926215 | 8.5 5.0 |
| Irepodun | 356 | Ilobu | Osun | NG.OS.IP | 7.84861, 4.498797 | 7.5 4.5 |
| Nasarawa | 519 | Bompai | Kano | NG.KN.NA | 12.00446, | |
| 8.578262 | 11.5 8.5 | |||||
| Nasarawa | 520 | Nasarawa | Nasarawa | NG.NA.NA | 8.304034, 7.760272 | 8.53477, 7.70153 |
| Obi | 546 | Obarike-Ito | Benue | NG.BE.OB | 7.022495, 8.281026 | 7.01129, 8.33118 |
| Obi | 547 | Obi | Nasarawa | NG.NA.OB | 8.35534, 8.734777 | 8.37944, 8.78561 |
| Surelere | 693 | Surelere | Lagos | NG.LA.SU | 6.493217, 3.346919 | 6.50048, 3.35488 |
| Surelere | 694 | Iresa-Adu | Oyo | NG.OY.SU | 8.088897, 4.393574 | 8.08459, 4.38538 |
Two (2) main steps involved :
5 Ong Z.R.J. (2022). Geospatial Analytics for Social Good - Understanding Nigeria Water functional and non-functional water point rate. https://jordan-isss624-geospatial.netlify.app/posts/geo/geospatial_exercise/#checking-of-duplicated-area-name
bdy_nga$shapeName[c(94,95,304,305,355,356,519,520,546,547,693,694)] <-
c("Bassa Kogi",
"Bassa Plateau",
"Ifelodun Kwara",
"Ifelodun Osun",
"Irepodun Kwara",
"Irepodun Osun",
"Nasarawa Kano",
"Nasarawa Nasarawa",
"Obi Nasarawa",
"Obi Benue",
"Surulere Lagos",
"Surulere Oyo")
bdy_nga$shapeName[c(94,95,304,305,355,356,519,520,546,547,693,694)] [1] "Bassa Kogi" "Bassa Plateau" "Ifelodun Kwara"
[4] "Ifelodun Osun" "Irepodun Kwara" "Irepodun Osun"
[7] "Nasarawa Kano" "Nasarawa Nasarawa" "Obi Nasarawa"
[10] "Obi Benue" "Surulere Lagos" "Surulere Oyo"
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] shapeName bdy_nga$shapeName geometry
<0 rows> (or 0-length row.names)
Combine both attribute and boundary of the water points into a simple feature object.
Usage of the code chunk below :
st_join( ) - sf - to join sf-class objects based on geometry, namely, wp_sf and bdy_nga.
Geometry set for 95008 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Geodetic CRS: WGS 84
First 5 geometries:
POINT (10.47318 10.60104)
POINT (6.95009 6.78599)
POINT (7.615451 6.799595)
POINT (7.30539 6.30817)
POINT (10.44625 10.50681)
No duplicate combinations found of: shapeName, lat_lon_deg
Simple feature collection with 0 features and 24 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
# A tibble: 0 × 25
# … with 25 variables: shapeName <chr>, lat_lon_deg <chr>, dupe_count <int>,
# row_id <dbl>, lat_deg <dbl>, lon_deg <dbl>, New Georeferenced Column <chr>,
# water_source <chr>, water_source_clean <chr>, water_source_category <chr>,
# water_tech_clean <chr>, water_tech_category <chr>, status_clean <chr>,
# status <chr>, clean_adm1 <chr>, clean_adm2 <chr>,
# water_point_population <dbl>, local_population_1km <dbl>,
# crucialness_score <dbl>, pressure_score <dbl>, usage_capacity <dbl>, …
Remarks :
Each water point observation is unique as there are no duplicate combination of “shapeName” together with “lat_lon_deg”.
n % val%
FALSE 29856 31.4 31.4
TRUE 65123 68.5 68.6
NA 29 0.0 NA
Remarks :
There are 29,713 “FALSE”, which is more than 30% of LGA names mismatched between “shapeName” and “clean_adm2”.
Since the geoBoundaries data is collected from government-published and reliable internet sources.6
The 29 NA’s are 29 water points located beyond the LGA boundary, as shown in the plot below.
6 Daniel et. al (2020) geoBoundaries: A global database of political administrative boundaries. PlosOne. https://doi.org/10.1371/journal.pone.0231866
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
| Name | wp_joined1 |
| Number of rows | 94979 |
| Number of columns | 24 |
| _______________________ | |
| Column type frequency: | |
| character | 13 |
| logical | 1 |
| numeric | 10 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| New Georeferenced Column | 0 | 1.00 | 11 | 45 | 0 | 94979 | 0 |
| lat_lon_deg | 0 | 1.00 | 8 | 42 | 0 | 94979 | 0 |
| water_source | 0 | 1.00 | 3 | 32 | 0 | 16 | 0 |
| water_source_clean | 302 | 1.00 | 8 | 22 | 0 | 5 | 0 |
| water_source_category | 302 | 1.00 | 4 | 11 | 0 | 3 | 0 |
| water_tech_clean | 10054 | 0.89 | 8 | 26 | 0 | 11 | 0 |
| water_tech_category | 10054 | 0.89 | 8 | 15 | 0 | 4 | 0 |
| status_clean | 10648 | 0.89 | 9 | 32 | 0 | 8 | 0 |
| status | 10648 | 0.89 | 14 | 156 | 0 | 834 | 0 |
| clean_adm1 | 0 | 1.00 | 3 | 25 | 0 | 37 | 0 |
| clean_adm2 | 0 | 1.00 | 3 | 19 | 0 | 753 | 0 |
| geometry | 0 | 1.00 | 7 | 37 | 0 | 94979 | 0 |
| shapeName | 0 | 1.00 | 3 | 18 | 0 | 761 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| is_urban | 0 | 1 | 0.21 | FAL: 75423, TRU: 19556 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| row_id | 0 | 1.00 | 199938.21 | 189720.69 | 10732.00 | 52622.50 | 86939.00 | 323664.50 | 681838.00 | ▇▃▂▂▁ |
| lat_deg | 0 | 1.00 | 9.33 | 2.48 | 4.30 | 7.36 | 9.09 | 11.83 | 13.87 | ▃▇▅▅▆ |
| lon_deg | 0 | 1.00 | 7.50 | 2.25 | 2.71 | 5.52 | 7.89 | 9.08 | 14.22 | ▃▃▇▃▁ |
| water_point_population | 537 | 0.99 | 1246.24 | 4027.60 | 0.00 | 117.00 | 413.00 | 1169.00 | 384595.00 | ▇▁▁▁▁ |
| local_population_1km | 537 | 0.99 | 3723.38 | 7418.08 | 0.00 | 597.00 | 1756.00 | 4393.00 | 384595.00 | ▇▁▁▁▁ |
| crucialness_score | 6873 | 0.93 | 0.41 | 0.34 | 0.00 | 0.13 | 0.30 | 0.63 | 1.00 | ▇▅▃▁▅ |
| pressure_score | 6873 | 0.93 | 3.21 | 9.04 | 0.00 | 0.40 | 1.18 | 3.10 | 776.97 | ▇▁▁▁▁ |
| usage_capacity | 0 | 1.00 | 488.63 | 310.95 | 50.00 | 300.00 | 300.00 | 1000.00 | 1000.00 | ▁▇▁▁▃ |
| staleness_score | 0 | 1.00 | 44.94 | 6.29 | 23.13 | 41.49 | 42.87 | 44.34 | 99.00 | ▁▇▁▁▁ |
| rehab_priority | 53089 | 0.44 | 1545.55 | 5244.05 | 0.00 | 136.00 | 522.00 | 1527.00 | 384595.00 | ▇▁▁▁▁ |
mutate( ) - dplyr - to run replace_na( ) function.
wp_joined1 <- wp_joined1 %>%
mutate(status_clean = replace_na(status_clean, "Unknown")) %>%
mutate(status = replace_na(status, "Unknown")) %>%
mutate(water_tech_category = replace_na(water_tech_category, "Unknown")) %>%
mutate(water_source_category = replace_na(water_source_category, "Unknown")) %>%
mutate(water_point_population = replace_na(water_point_population, 0)) %>%
mutate(local_population_1km = replace_na(local_population_1km, 0)) %>%
mutate(crucialness_score = replace_na(crucialness_score, 0)) %>%
mutate(pressure_score = replace_na(pressure_score, 0))There are 9 unique values for “status_clean”. However, four (4) of them share the same context :
“Non functional due to dry season”
“Non-Functional due to dry season”
“Abandoned”
“Abandoned/Decommissioned”
Hence, the same context values need to combine into one unique value.
Save the updated values into wp_joined1 RDS file.
n % val%
Functional 45883 88.0 88.0
Functional but needs repair 4579 8.8 8.8
Functional but not in use 1686 3.2 3.2
[1] 52148
[1] 54.88801
Remarks :
The total functional water points is 52,148 which is about 54.89% of total water points.
Simple feature collection with 10 features and 3 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 2.711632 ymin: 4.302938 xmax: 13.5022 ymax: 13.86331
Geodetic CRS: WGS 84
# A tibble: 10 × 4
status_clean usage_capacity n geometry
* <chr> <dbl> <int> <MULTIPOINT [°]>
1 Functional 300 33687 ((3.064921 7.994882), (3.06…
2 Functional 1000 12124 ((3.080189 7.99252), (3.085…
3 Functional but needs repair 300 3306 ((3.340832 8.037962), (3.34…
4 Functional but needs repair 1000 1271 ((3.373801 7.992051), (3.33…
5 Functional but not in use 300 1071 ((3.046639 8.017765), (2.88…
6 Functional but not in use 1000 612 ((3.088655 8.005296), (3.05…
7 Functional 250 70 ((3.355785 6.498105), (3.67…
8 Functional but not in use 250 3 ((8.032945 6.878883), (7.00…
9 Functional 50 2 ((7.027967 4.765731), (8.92…
10 Functional but needs repair 250 2 ((6.465915 5.826699), (7.93…
n % val%
FALSE 47006 90.1 90.1
TRUE 5142 9.9 9.9
[1] 11252574
Remarks :
Given the “crucialness_score” is a ratio of current water point users to the total population within 1 km radius thereof :
Currently, 5,142 water points serve the population within a 1 km radius at its capacity limit.
The usage capacity may need to be increased to sustain or improve the growth or development rate within 1km of these water points.
Should the population within 1 km therefrom grow above 11,252,574, there may be multiple repercussions in resources management, urbanisation progress, local food and beverage consumption, local commodity prices, or worst case scenario would be the stability of civil society.
Mode FALSE TRUE
logical 27469 24679
[1] 100
Remarks :
Given the “pressure_score” is the ratio of the current water point users to the usage capacity thereof :
ggplot(data = wpt_functional,
aes(fct_infreq(status_clean), fill=status_clean))+
geom_bar()+
geom_text(
aes(label=after_stat(count)),
stat='count',
nudge_x=-0.25,
vjust=-0.2)+
geom_text(
aes(label= scales::percent(signif(after_stat(count/sum(count))))),
stat='count',
nudge_x=0.25,
vjust=-0.2)+
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
guides(fill=guide_legend (title = 'Status'))
ggplot(data=wpt_functional,
aes(x=fct_infreq(
water_tech_category)))+
geom_bar(aes(
fill = water_tech_category),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
guides(fill=guide_legend (title = 'Water Tech Deployed'))Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(count)` instead.

ggplot(data=wpt_functional,
aes(x=fct_infreq(
water_source_clean)))+
geom_bar(aes(
fill = water_source_clean),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Source of Water Supply'))
shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional
Min. : 0.00
1st Qu.: 32.61
Median : 47.41
Mean : 49.84
3rd Qu.: 66.99
Max. :100.00
-- reveal value :: “status_clean”
n % val%
Abandoned/Decommissioned 234 0.7 0.7
Non-Functional 29385 91.7 91.7
Non-Functional due to dry season 2410 7.5 7.5
[1] 32029
[1] 33.7119
Remarks :
There are 32,204, which is about 33.9% out of total water points.
Simple feature collection with 7 features and 3 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 13.4192 ymax: 13.86567
Geodetic CRS: WGS 84
# A tibble: 7 × 4
status_clean usage_capac…¹ n geometry
* <chr> <dbl> <int> <MULTIPOINT [°]>
1 Non-Functional 300 18492 ((3.064526 7.994448), (3…
2 Non-Functional 1000 10852 ((3.083391 7.993231), (3…
3 Non-Functional due to dry season 300 2012 ((3.051752 7.984243), (3…
4 Non-Functional due to dry season 1000 398 ((3.056661 7.985808), (3…
5 Abandoned/Decommissioned 1000 152 ((4.713438 7.891137), (4…
6 Abandoned/Decommissioned 300 82 ((3.199483 8.912549), (2…
7 Non-Functional 250 41 ((3.976195 6.582998), (3…
# … with abbreviated variable name ¹usage_capacity
[1] 93999535
[1] 46255888
Remarks :
Given the “crucialness_score” is a ratio of current water point users to the total population within a 1 km radius thereof , in the context of non-functional :
ggplot(data = wpt_nonFunctional,
aes(fct_infreq(status_clean),
fill=status_clean))+
geom_bar()+
geom_text(
aes(label=after_stat(count)),
stat='count',
nudge_x=-0.25,
vjust=-0.2)+
geom_text(
aes(label= scales::percent(
signif(
after_stat(
count/sum(count)
)))),
stat='count',
nudge_x=0.25,
vjust=-0.2)+
scale_x_discrete(
guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Status'))
ggplot(data=wpt_nonFunctional,
aes(fct_infreq(
water_tech_category)))+
geom_bar(aes(
fill = water_tech_category),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Water Tech Deployed'))
ggplot(data=wpt_nonFunctional,
aes(fct_infreq(
water_source_clean)))+
geom_bar(aes(
fill = water_source_clean),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Source of Water Supply'))
shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional
Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77
Median : 47.41 Median : 33.50 Median : 34.89
Mean : 49.84 Mean : 41.37 Mean : 35.58
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00
Max. :100.00 Max. :278.00 Max. :100.00
[1] 10656
[1] 11.2159
Remarks :
There are 10,656 water points with unknown status, about 11.22% of total water points.
shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown
Min. : 0.00
1st Qu.: 0.00
Median : 0.00
Mean : 12.55
3rd Qu.: 20.83
Max. :100.00
Usage of the code chunk below :
qtm( ) - tmap - to plot a thematic map quickly.
tmap_arrange( ) - tmap - to arrange small multiples in grid layout.
total_wp <- qtm(wp_nga,"total_wp")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
wp_functional <- qtm(wp_nga,"wp_functional")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
wp_nonFunctional <- qtm(wp_nga,"wp_nonFunctional")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
wp_unknown <- qtm(wp_nga,"wp_unknown")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
tmap_arrange(total_wp, wp_functional, wp_nonFunctional, wp_unknown, asp=0, ncol = 2, nrow = 2, widths = 5, heights = 10, sync = TRUE)
n % val%
Hand Pump 58755 61.8 61.8
Mechanized Pump 25644 27.0 27.0
Unknown 10055 10.6 10.6
Tapstand 553 0.6 0.6
Rope and Bucket 1 0.0 0.0
Remarks :
Only “Hand Pump”, “Mechanized Pump”, and “Tapstand” are to be extracted for further analysis as the rest are either less than 0.5% or “Unknown”.
wtc_hPump <- wp_joined1 %>%
filter(water_tech_category %in%
"Hand Pump")
wtc_mPump <- wp_joined1 %>%
filter(water_tech_category %in%
"Mechanized Pump")
wtc_tStand <- wp_joined1 %>%
filter(water_tech_category %in%
"Tapstand")
wp_nga <- wp_nga %>%
mutate(`total_handPump` = lengths(
st_intersects(bdy_nga, wtc_hPump)
)) %>%
mutate(`total_mechPump` = lengths(
st_intersects(bdy_nga, wtc_mPump)
)) %>%
mutate(`total_tapStand` = lengths(
st_intersects(bdy_nga, wtc_tStand)
)) %>%
mutate(`pct_handPump` = (`total_handPump`/`total_wp`*100)) %>%
mutate(`pct_mechPump` = (`total_mechPump`/`total_wp`*100)) %>%
mutate(`pct_tapStand` = (`total_tapStand`/`total_wp`*100)) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown total_handPump total_mechPump total_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 0.00 1st Qu.: 6.00 1st Qu.: 11.00 1st Qu.: 0.0000
Median : 0.00 Median : 47.00 Median : 25.50 Median : 0.0000
Mean : 12.55 Mean : 75.89 Mean : 33.12 Mean : 0.7145
3rd Qu.: 20.83 3rd Qu.:111.00 3rd Qu.: 46.00 3rd Qu.: 0.0000
Max. :100.00 Max. :764.00 Max. :245.00 Max. :42.0000
pct_handPump pct_mechPump pct_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 16.70 1st Qu.: 12.20 1st Qu.: 0.0000
Median : 50.99 Median : 31.27 Median : 0.0000
Mean : 48.73 Mean : 37.54 Mean : 0.5794
3rd Qu.: 77.78 3rd Qu.: 57.71 3rd Qu.: 0.0000
Max. :100.00 Max. :100.00 Max. :32.8947
handPump <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_handPump",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
mechPump <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_mechPump",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tapStand <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_tapStand",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tmap_arrange(handPump, mechPump, tapStand,
asp=1,
ncol=2,
sync = TRUE)
n % val%
300 68789 72.4 72.4
1000 25644 27.0 27.0
250 573 0.6 0.6
50 2 0.0 0.0
Remarks :
Only “300”, “1000”, and “250” are to be extracted for further analysis as the rest are either less than 0.5% or “Unknown”.
But, “50” will be included in the new variable “total_ucN1000” as part of the none ‘1000’ “usage_capacity” value.
uCap_300 <- wp_joined1 %>%
filter(usage_capacity %in%
"300")
uCap_1000 <- wp_joined1 %>%
filter(usage_capacity %in%
"1000")
uCap_250 <- wp_joined1 %>%
filter(usage_capacity %in%
"250")
uCap_50 <- wp_joined1 %>%
filter(usage_capacity %in%
"50")
wp_nga <- wp_nga %>%
mutate(`total_uc300` = lengths(
st_intersects(bdy_nga, uCap_300)
)) %>%
mutate(`total_uc1000` = lengths(
st_intersects(bdy_nga, uCap_1000)
)) %>%
mutate(`total_uc250` = lengths(
st_intersects(bdy_nga, uCap_250)
)) %>%
mutate(`total_uc50` = lengths(
st_intersects(bdy_nga, uCap_50)
)) %>%
mutate(`total_ucN1000` = ((lengths(
st_intersects(
bdy_nga, uCap_300))) + (lengths(
st_intersects(
bdy_nga, uCap_250))) + (lengths(
st_intersects(
bdy_nga, uCap_50))))
)%>%
mutate(`pct_ucN1000` = (`total_ucN1000`/`total_wp`*100)) %>%
mutate(`pct_uc300` = (`total_uc300`/`total_wp`*100)) %>%
mutate(`pct_uc1000` = (`total_uc1000`/`total_wp`*100)) %>%
mutate(`pct_uc250` = (`total_uc250`/`total_wp`*100)) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown total_handPump total_mechPump total_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 0.00 1st Qu.: 6.00 1st Qu.: 11.00 1st Qu.: 0.0000
Median : 0.00 Median : 47.00 Median : 25.50 Median : 0.0000
Mean : 12.55 Mean : 75.89 Mean : 33.12 Mean : 0.7145
3rd Qu.: 20.83 3rd Qu.:111.00 3rd Qu.: 46.00 3rd Qu.: 0.0000
Max. :100.00 Max. :764.00 Max. :245.00 Max. :42.0000
pct_handPump pct_mechPump pct_tapStand total_uc300
Min. : 0.00 Min. : 0.00 Min. : 0.0000 Min. : 0.00
1st Qu.: 16.70 1st Qu.: 12.20 1st Qu.: 0.0000 1st Qu.: 15.25
Median : 50.99 Median : 31.27 Median : 0.0000 Median : 59.00
Mean : 48.73 Mean : 37.54 Mean : 0.5794 Mean : 88.85
3rd Qu.: 77.78 3rd Qu.: 57.71 3rd Qu.: 0.0000 3rd Qu.:126.75
Max. :100.00 Max. :100.00 Max. :32.8947 Max. :767.00
total_uc1000 total_uc250 total_uc50 total_ucN1000
Min. : 0.00 Min. : 0.0000 Min. :0.000000 Min. : 0.00
1st Qu.: 11.00 1st Qu.: 0.0000 1st Qu.:0.000000 1st Qu.: 16.00
Median : 25.50 Median : 0.0000 Median :0.000000 Median : 60.00
Mean : 33.12 Mean : 0.7403 Mean :0.002584 Mean : 89.59
3rd Qu.: 46.00 3rd Qu.: 0.0000 3rd Qu.:0.000000 3rd Qu.:127.75
Max. :245.00 Max. :42.0000 Max. :1.000000 Max. :767.00
pct_ucN1000 pct_uc300 pct_uc1000 pct_uc250
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 39.68 1st Qu.: 38.67 1st Qu.: 12.20 1st Qu.: 0.0000
Median : 67.03 Median : 65.91 Median : 31.27 Median : 0.0000
Mean : 60.78 Mean : 60.17 Mean : 37.54 Mean : 0.6114
3rd Qu.: 87.35 3rd Qu.: 87.02 3rd Qu.: 57.71 3rd Qu.: 0.0000
Max. :100.00 Max. :100.00 Max. :100.00 Max. :32.8947
uc300 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_uc300",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
uc1000 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_uc1000",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
uc250 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_uc250",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tmap_arrange(uc300, uc1000, uc250,
asp=1,
ncol=2,
sync = TRUE)
urban_1 <- wp_joined1 %>%
filter(is_urban %in%
"TRUE")
urban_0 <- wp_joined1 %>%
filter(is_urban %in%
"FALSE")
wp_nga <- wp_nga %>%
mutate(`total_urban1` = lengths(
st_intersects(bdy_nga, urban_1)
)) %>%
mutate(`total_urban0` = lengths(
st_intersects(bdy_nga, urban_0)
)) %>%
mutate(`pct_urban1` = (`total_urban1`/`total_wp`*100)) %>%
mutate(`pct_urban0` = (`total_urban0`/`total_wp`*100)) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown total_handPump total_mechPump total_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 0.00 1st Qu.: 6.00 1st Qu.: 11.00 1st Qu.: 0.0000
Median : 0.00 Median : 47.00 Median : 25.50 Median : 0.0000
Mean : 12.55 Mean : 75.89 Mean : 33.12 Mean : 0.7145
3rd Qu.: 20.83 3rd Qu.:111.00 3rd Qu.: 46.00 3rd Qu.: 0.0000
Max. :100.00 Max. :764.00 Max. :245.00 Max. :42.0000
pct_handPump pct_mechPump pct_tapStand total_uc300
Min. : 0.00 Min. : 0.00 Min. : 0.0000 Min. : 0.00
1st Qu.: 16.70 1st Qu.: 12.20 1st Qu.: 0.0000 1st Qu.: 15.25
Median : 50.99 Median : 31.27 Median : 0.0000 Median : 59.00
Mean : 48.73 Mean : 37.54 Mean : 0.5794 Mean : 88.85
3rd Qu.: 77.78 3rd Qu.: 57.71 3rd Qu.: 0.0000 3rd Qu.:126.75
Max. :100.00 Max. :100.00 Max. :32.8947 Max. :767.00
total_uc1000 total_uc250 total_uc50 total_ucN1000
Min. : 0.00 Min. : 0.0000 Min. :0.000000 Min. : 0.00
1st Qu.: 11.00 1st Qu.: 0.0000 1st Qu.:0.000000 1st Qu.: 16.00
Median : 25.50 Median : 0.0000 Median :0.000000 Median : 60.00
Mean : 33.12 Mean : 0.7403 Mean :0.002584 Mean : 89.59
3rd Qu.: 46.00 3rd Qu.: 0.0000 3rd Qu.:0.000000 3rd Qu.:127.75
Max. :245.00 Max. :42.0000 Max. :1.000000 Max. :767.00
pct_ucN1000 pct_uc300 pct_uc1000 pct_uc250
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 39.68 1st Qu.: 38.67 1st Qu.: 12.20 1st Qu.: 0.0000
Median : 67.03 Median : 65.91 Median : 31.27 Median : 0.0000
Mean : 60.78 Mean : 60.17 Mean : 37.54 Mean : 0.6114
3rd Qu.: 87.35 3rd Qu.: 87.02 3rd Qu.: 57.71 3rd Qu.: 0.0000
Max. :100.00 Max. :100.00 Max. :100.00 Max. :32.8947
total_urban1 total_urban0 pct_urban1 pct_urban0
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 0.00 1st Qu.: 23.00 1st Qu.: 0.00 1st Qu.: 57.27
Median : 9.00 Median : 64.00 Median : 11.95 Median : 86.45
Mean : 25.27 Mean : 97.45 Mean : 25.61 Mean : 72.71
3rd Qu.: 33.00 3rd Qu.:141.00 3rd Qu.: 38.44 3rd Qu.:100.00
Max. :324.00 Max. :894.00 Max. :100.00 Max. :100.00
urban1 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_urban1",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
urban0 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_urban0",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tmap_arrange(urban1, urban0,
asp=1,
ncol=2,
sync = TRUE)
Usage of the code chunk below :
st_crs( ) - sf - to inspect the coordinate reference system.
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Remarks :
The EPSG for wp_nga is 4326, which is WGS 84. To compute the proximity distance matrix for clustering analysis, this coordinate reference system needs to transform into EPSG: 26391.
Usage of the code chunk below :
st_set_crs( ) - sf - to update the coordinate reference system.
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
Coordinate Reference System:
User input: EPSG:26391
wkt:
PROJCRS["Minna / Nigeria West Belt",
BASEGEOGCRS["Minna",
DATUM["Minna",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4263]],
CONVERSION["Nigeria West Belt",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",4.5,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.99975,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",230738.26,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Nigeria - onshore west of 6°30'E, onshore and offshore shelf."],
BBOX[3.57,2.69,13.9,6.5]],
ID["EPSG",26391]]
-- review CRS :: bdy_ngaTrans
Coordinate Reference System:
User input: EPSG:26391
wkt:
PROJCRS["Minna / Nigeria West Belt",
BASEGEOGCRS["Minna",
DATUM["Minna",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4263]],
CONVERSION["Nigeria West Belt",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",4.5,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.99975,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",230738.26,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Nigeria - onshore west of 6°30'E, onshore and offshore shelf."],
BBOX[3.57,2.69,13.9,6.5]],
ID["EPSG",26391]]


Remarks :
Among these 3 key categories of “status_clean”, “unknown” has the most outliers.
pctFunctional <- ggplot(data = wp_ngaTrans,
aes(x = `pct_functional`))+
geom_histogram(bins=10,
colour = "black",
fill = "#543005")
pctNonFunctional <- ggplot(data = wp_ngaTrans,
aes(x = `pct_nonFunctional`))+
geom_histogram(bins=10,
colour = "black",
fill = "#C16622FF")
pctUnknown <- ggplot(data = wp_ngaTrans,
aes(x = `pct_unknown`))+
geom_histogram(bins = 10,
colour = "black",
fill = "#FFA319FF") shapeName pct_functional pct_nonFunctional pct_unknown pct_handPump
1 Aba North 41.17647 52.94118 5.882353 11.764706
2 Aba South 40.84507 46.47887 9.859155 9.859155
3 Abadam 0.00000 0.00000 0.000000 0.000000
4 Abaji 40.35088 59.64912 0.000000 40.350877
5 Abak 47.91667 50.00000 0.000000 8.333333
pct_mechPump pct_tapStand pct_uc300 pct_uc1000 pct_ucN1000 pct_uc250
1 82.35294 0 17.647059 82.35294 17.647059 0
2 87.32394 0 12.676056 87.32394 12.676056 0
3 0.00000 0 0.000000 0.00000 0.000000 0
4 59.64912 0 40.350877 59.64912 40.350877 0
5 91.66667 0 8.333333 91.66667 8.333333 0
pct_urban0
1 0.000000
2 5.633803
3 0.000000
4 84.210526
5 83.333333
Usage of the code chunk below :
corrplot.mixed( ) - corrplot - to use mixed methods to visualise a correlation matrix.
This plot allows to identify the pattern and the relationship in the matrix.

Remarks :
Following are the pairs with strong correlation :
| correlation coefficients | variable_1 | variable_2 |
|---|---|---|
| 1.00 | pct_mechPump | pct_uc1000 |
| 0.99 | pct_tapStand | pct_uc250 |
| 0.99 | pct_uc300 | pct_ucN1000 |
| -0.91 | pct_mechPump | pct_ucN1000 |
| -0.91 | pct_uc1000 | pct_ucN1000 |
| -0.90 | pct_mechPump | pct_uc300 |
| -0.90 | pct_uc300 | pct_uc1000 |
pct_functional pct_nonFunctional pct_unknown pct_handPump
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 20.77 1st Qu.: 0.00 1st Qu.: 16.70
Median : 47.41 Median : 34.89 Median : 0.00 Median : 50.99
Mean : 49.84 Mean : 35.58 Mean : 12.55 Mean : 48.73
3rd Qu.: 66.99 3rd Qu.: 50.00 3rd Qu.: 20.83 3rd Qu.: 77.78
Max. :100.00 Max. :100.00 Max. :100.00 Max. :100.00
pct_tapStand pct_uc300 pct_uc1000 pct_uc250
Min. : 0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.: 38.67 1st Qu.: 12.20 1st Qu.: 0.0000
Median : 0.0000 Median : 65.91 Median : 31.27 Median : 0.0000
Mean : 0.5794 Mean : 60.17 Mean : 37.54 Mean : 0.6114
3rd Qu.: 0.0000 3rd Qu.: 87.02 3rd Qu.: 57.71 3rd Qu.: 0.0000
Max. :32.8947 Max. :100.00 Max. :100.00 Max. :32.8947
pct_urban0
Min. : 0.00
1st Qu.: 57.27
Median : 86.45
Mean : 72.71
3rd Qu.:100.00
Max. :100.00
There are four (4) main steps :
As shown in the 4.2.3.1, there are few variables with Max. different from others. Hence, standardisation will be required prior to further analysis.
pct_functional pct_nonFunctional pct_unknown pct_handPump
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.3261 1st Qu.:0.2077 1st Qu.:0.0000 1st Qu.:0.1670
Median :0.4741 Median :0.3489 Median :0.0000 Median :0.5099
Mean :0.4984 Mean :0.3558 Mean :0.1255 Mean :0.4873
3rd Qu.:0.6699 3rd Qu.:0.5000 3rd Qu.:0.2083 3rd Qu.:0.7778
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
pct_tapStand pct_uc300 pct_uc1000 pct_uc250
Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.3867 1st Qu.:0.1220 1st Qu.:0.00000
Median :0.00000 Median :0.6591 Median :0.3127 Median :0.00000
Mean :0.01761 Mean :0.6017 Mean :0.3754 Mean :0.01859
3rd Qu.:0.00000 3rd Qu.:0.8702 3rd Qu.:0.5771 3rd Qu.:0.00000
Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.00000
pct_urban0
Min. :0.0000
1st Qu.:0.5727
Median :0.8645
Mean :0.7271
3rd Qu.:1.0000
Max. :1.0000
vars n mean sd median trimmed mad min max range skew
pct_functional 1 774 0 1 -0.10 -0.02 1.04 -2.06 2.07 4.13 0.14
pct_nonFunctional 2 774 0 1 -0.03 -0.02 1.05 -1.71 3.10 4.81 0.23
pct_unknown 3 774 0 1 -0.62 -0.22 0.00 -0.62 4.30 4.92 2.01
pct_handPump 4 774 0 1 0.07 0.00 1.37 -1.49 1.57 3.06 -0.09
pct_tapStand 5 774 0 1 -0.19 -0.19 0.00 -0.19 10.46 10.65 7.22
pct_uc300 6 774 0 1 0.19 0.08 1.10 -2.02 1.34 3.35 -0.56
pct_uc1000 7 774 0 1 -0.21 -0.09 1.05 -1.28 2.14 3.42 0.61
pct_uc250 8 774 0 1 -0.20 -0.20 0.00 -0.20 10.37 10.57 7.10
pct_urban0 9 774 0 1 0.42 0.17 0.62 -2.23 0.84 3.06 -1.12
kurtosis se
pct_functional -0.62 0.04
pct_nonFunctional -0.42 0.04
pct_unknown 4.15 0.04
pct_handPump -1.33 0.04
pct_tapStand 58.65 0.04
pct_uc300 -0.87 0.04
pct_uc1000 -0.78 0.04
pct_uc250 57.14 0.04
pct_urban0 -0.09 0.04
fwp <- ggplot(data=cluster_varsTrim,
aes(x= `pct_functional`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Before Standardisation")
fwp_stdDf <- as.data.frame(nga_wpStd)
fwp_std <- ggplot(data=fwp_stdDf,
aes(x=`pct_functional`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Min-Max Stdsn.")
fwp_zDf <- as.data.frame(nga_wpZ)
fwp_z <- ggplot(data=fwp_zDf,
aes(x=`pct_functional`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Z-score Stdsn.")
ggarrange(fwp, fwp_std, fwp_z,
ncol = 3,
nrow = 1)
fwp <- ggplot(data=cluster_varsTrim,
aes(x= `pct_functional`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Before Standardisation")
fwp_stdDf <- as.data.frame(nga_wpStd)
fwp_std <- ggplot(data=fwp_stdDf,
aes(x=`pct_functional`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Min-Max Stdsn.")
fwp_zDf <- as.data.frame(nga_wpZ)
fwp_z <- ggplot(data=fwp_zDf,
aes(x=`pct_functional`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Z-score Stdsn.")
ggarrange(fwp, fwp_std, fwp_z,
ncol = 3,
nrow = 1)
HP <- ggplot(data=cluster_varsTrim,
aes(x= `pct_handPump`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Before Standardisation")
fwp_stdDf <- as.data.frame(nga_wpStd)
HP_std <- ggplot(data=fwp_stdDf,
aes(x=`pct_handPump`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Min-Max Stdsn.")
fwp_zDf <- as.data.frame(nga_wpZ)
HP_z <- ggplot(data=fwp_zDf,
aes(x=`pct_handPump`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Z-score Stdsn.")
ggarrange(HP, HP_std, HP_z,
ncol = 3,
nrow = 1)
HP <- ggplot(data=cluster_varsTrim,
aes(x= `pct_handPump`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Before Standardisation")
fwp_stdDf <- as.data.frame(nga_wpStd)
HP_std <- ggplot(data=fwp_stdDf,
aes(x=`pct_handPump`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Min-Max Stdsn.")
fwp_zDf <- as.data.frame(nga_wpZ)
HP_z <- ggplot(data=fwp_zDf,
aes(x=`pct_handPump`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Z-score Stdsn.")
ggarrange(HP, HP_std, HP_z,
ncol = 3,
nrow = 1)
uc1000 <- ggplot(data=cluster_varsTrim,
aes(x= `pct_uc1000`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Before Standardisation")
fwp_stdDf <- as.data.frame(nga_wpStd)
uc1000_std <- ggplot(data=fwp_stdDf,
aes(x=`pct_uc1000`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Min-Max Stdsn.")
fwp_zDf <- as.data.frame(nga_wpZ)
uc1000_z <- ggplot(data=fwp_zDf,
aes(x=`pct_uc1000`)) +
geom_histogram(bins=20,
color="black",
fill="steelblue") +
ggtitle("Z-score Stdsn.")
ggarrange(uc1000, uc1000_std, uc1000_z,
ncol = 3,
nrow = 1)
uc1000 <- ggplot(data=cluster_varsTrim,
aes(x= `pct_uc1000`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Before Standardisation")
fwp_stdDf <- as.data.frame(nga_wpStd)
uc1000_std <- ggplot(data=fwp_stdDf,
aes(x=`pct_uc1000`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Min-Max Stdsn.")
fwp_zDf <- as.data.frame(nga_wpZ)
uc1000_z <- ggplot(data=fwp_zDf,
aes(x=`pct_uc1000`)) +
geom_density(color="black",
fill="steelblue") +
ggtitle("Z-score Stdsn.")
ggarrange(uc1000, uc1000_std, uc1000_z,
ncol = 3,
nrow = 1)
Usage of the code chunk below :
hclust( ) - stats - to compute cluster with agglomeration method.
ggdendrogram( ) - ggdendro - to plot dendrogram with tools available in ggplot2.
Usage of the code chunk below :
agnes( ) - cluster - to get agglomerative coefficient of 4 clustering structure, namely “average”, “single”, “complete” and “ward”.
average single complete ward
0.9264460 0.8825086 0.9494033 0.9923235
Remarks :
Value 1 indicate strongest clustering structure.
Ward’s method provides the strongest clustering structure. Therefore, Ward’s method to be used in subsequent analysis.
To determine the optimal clusters to retain, following commons methods are tested :
Gap statistic
Elbow
Average Silhouette
Usage of the code chunk below :
clusGap( ) - cluster - to compute the gap statistic.
Clustering Gap statistic ["clusGap"] from call:
clusGap(x = cluster_varsTrim, FUNcluster = hcut, K.max = 30, B = 50, nstart = 25)
B=50 simulated reference sets, k = 1..30; spaceH0="scaledPCA"
--> Number of clusters (method 'firstmax'): 30
logW E.logW gap SE.sim
[1,] 9.815280 10.315513 0.5002330 0.008043258
[2,] 9.602465 10.203144 0.6006791 0.008510149
[3,] 9.510126 10.144007 0.6338806 0.010041659
[4,] 9.443727 10.094054 0.6503272 0.009408325
[5,] 9.337773 10.055036 0.7172630 0.008377048
[6,] 9.286151 10.021049 0.7348986 0.008061617
[7,] 9.226030 9.992116 0.7660864 0.007509762
[8,] 9.181055 9.967079 0.7860239 0.007406854
[9,] 9.132662 9.944999 0.8123364 0.007505962
[10,] 9.088576 9.924941 0.8363644 0.008003248
[11,] 9.057435 9.906340 0.8489044 0.008086779
[12,] 9.019733 9.888898 0.8691655 0.008378669
[13,] 8.988798 9.872948 0.8841499 0.008499708
[14,] 8.962331 9.857905 0.8955736 0.008609360
[15,] 8.932159 9.843557 0.9113980 0.008574353
[16,] 8.908477 9.830133 0.9216564 0.008442998
[17,] 8.880801 9.817233 0.9364313 0.008235439
[18,] 8.846775 9.805149 0.9583744 0.008068772
[19,] 8.828254 9.793671 0.9654167 0.007975742
[20,] 8.812263 9.782502 0.9702393 0.007904523
[21,] 8.793736 9.771938 0.9782012 0.007903082
[22,] 8.777957 9.761659 0.9837022 0.007927244
[23,] 8.762944 9.751686 0.9887413 0.007838986
[24,] 8.745719 9.741903 0.9961837 0.007850884
[25,] 8.732706 9.732508 0.9998020 0.007792150
[26,] 8.716858 9.723358 1.0064996 0.007813310
[27,] 8.703095 9.714414 1.0113191 0.007684052
[28,] 8.684688 9.705770 1.0210819 0.007586041
[29,] 8.664250 9.697408 1.0331579 0.007592467
[30,] 8.649964 9.689233 1.0392698 0.007607051
Usage of the code chunk below :
fviz_nbclust( ) - factoextra - to compute and visualise the Optimal Number of clusters.
Warning: did not converge in 10 iterations
Warning: did not converge in 10 iterations
Warning: did not converge in 10 iterations


Remarks :
Given the Elbow method, Silhouette method and Gap Statistic method, the 5-cluster by Silhouette method will be used for the rest of the study.
The data is loaded into a data frame, but it has to be a data matrix to plot the heatmap. Hence, the data frame will need to first transform into a matrix.
Usage of the code chunk below :
heatmaply( ) - heatmaply - to build an interactive cluster heatmap.
heatmaply(normalize(nga_clustMat),
Colv=NA,
dist_method = "euclidean",
hclust_method = "ward.D",
seriate = "OLO",
colors = Blues,
k_row = 5,
margins = c(NA,200,60,NA),
fontsize_row = 4,
fontsize_col = 5,
main="Geographic Segmentation of Nigeria by Water Points",
xlab = "Water Points",
ylab = "Nigeria LGA"
)Remarks :
Based on the plot above, 5 clusters to be retained for further analysis.
SKATER function only support sp objects in SpatialPolygonDataFrame. Hence, the wp_ngaTrans has to first transform into SpatialPolygonDataFrame before proceed further.
Usage of the code chunk below :
poly2nb( ) - spdep - to compute the neighbours list from polygon list.
Neighbour list object:
Number of regions: 774
Number of nonzero links: 4440
Percentage nonzero weights: 0.7411414
Average number of links: 5.736434
1 region with no links:
86
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 14
1 2 14 57 125 182 140 122 72 41 12 4 1 1
2 least connected regions:
138 560 with 1 link
1 most connected region:
508 with 14 links
Remarks :
There is one (1) region, i.e. #86 is without link. It has to be removed first before proceed to plot the neighbours list.
Neighbour list object:
Number of regions: 773
Number of nonzero links: 4440
Percentage nonzero weights: 0.7430602
Average number of links: 5.743855
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 14
2 14 57 125 182 140 122 72 41 12 4 1 1
2 least connected regions:
138 560 with 1 link
1 most connected region:
508 with 14 links
nb2listw( ) - spdep - to specify edge_cost as the spatial weights. Set the “style” to “B” to ensure the cost values are not row-standardised.
Warning in nb2listw(nga.nb1, edge_cost, style = "B"): zero sum general weights
Characteristics of weights list object:
Neighbour list object:
Number of regions: 773
Number of nonzero links: 4440
Percentage nonzero weights: 0.7430602
Average number of links: 5.743855
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 14
2 14 57 125 182 140 122 72 41 12 4 1 1
2 least connected regions:
138 560 with 1 link
1 most connected region:
508 with 14 links
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 773 597529 245120.7 38463020 406298492
Usage of the code chunk below :
skater( ) - spdep - to compute the spatially constrained cluster.
List of 8
$ groups : num [1:773] 3 3 1 5 4 1 2 2 1 3 ...
$ edges.groups:List of 5
..$ :List of 3
.. ..$ node: num [1:309] 773 747 492 131 382 224 413 488 439 257 ...
.. ..$ edge: num [1:308, 1:3] 131 382 224 413 257 767 439 704 476 75 ...
.. ..$ ssw : num 17013
..$ :List of 3
.. ..$ node: num [1:129] 597 315 316 557 195 571 339 744 205 213 ...
.. ..$ edge: num [1:128, 1:3] 315 316 557 195 571 15 82 579 744 351 ...
.. ..$ ssw : num 7874
..$ :List of 3
.. ..$ node: num [1:85] 364 10 729 215 337 551 102 103 66 19 ...
.. ..$ edge: num [1:84, 1:3] 23 536 578 103 19 375 727 617 188 103 ...
.. ..$ ssw : num 4545
..$ :List of 3
.. ..$ node: num [1:39] 550 202 330 287 374 732 537 586 733 201 ...
.. ..$ edge: num [1:38, 1:3] 612 586 136 245 332 429 504 537 586 616 ...
.. ..$ ssw : num 1294
..$ :List of 3
.. ..$ node: num [1:211] 67 510 401 122 24 526 475 489 663 303 ...
.. ..$ edge: num [1:210, 1:3] 67 549 510 119 639 401 556 122 693 70 ...
.. ..$ ssw : num 10380
$ not.prune : NULL
$ candidates : int [1:5] 1 2 3 4 5
$ ssto : num 52660
$ ssw : num [1:5] 52660 48724 44268 42679 41106
$ crit : num [1:2] 1 Inf
$ vec.crit : num [1:773] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "skater"
Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
coords[id2, : "add" is not a graphical parameter
Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
coords[id2, : "add" is not a graphical parameter
Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
coords[id2, : "add" is not a graphical parameter
Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
coords[id2, : "add" is not a graphical parameter
Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
coords[id2, : "add" is not a graphical parameter

To compare the output of hierarchical clustering and spatially constrained hierarchical clustering :
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

Usage of the code chunk below :
hclustgeo( ) - ClustGeo - to perform a typical Ward-like hierarchical clustering.
Usage of the code chunk below :
st_distance( ) - sf - to derive the spatial distance matrix before perform spatially constrained hierarchical clustering.
as.dist( ) - stats - to convert the data frame into matrix.
choicealpha( ) - psych - to determine a suitable value for the mixing parameter alpha.


Remarks :
With reference to the plot above, alpha = 0.4 to be used to perform spatially constrained hierarchical clustering.
Usage of the code chunk below :
ggparcoord( ) - GGally - to reveal clustering variables by cluster.
Simple feature collection with 3 features and 11 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 7.307433 ymin: 5.052192 xmax: 13.83477 ymax: 13.71406
Projected CRS: Minna / Nigeria West Belt
shapeName pct_functional pct_nonFunctional pct_unknown pct_handPump
1 Aba North 41.17647 52.94118 5.882353 11.764706
2 Aba South 40.84507 46.47887 9.859155 9.859155
3 Abadam 0.00000 0.00000 0.000000 0.000000
pct_tapStand pct_uc300 pct_uc1000 pct_uc250 pct_urban0 cluster
1 0 17.64706 82.35294 0 0.000000 1
2 0 12.67606 87.32394 0 5.633803 1
3 0 0.00000 0.00000 0 0.000000 1
geometry
1 MULTIPOLYGON (((7.401109 5....
2 MULTIPOLYGON (((7.334479 5....
3 MULTIPOLYGON (((13.83477 13...

The parallel coordinate plot above reveals that households in Cluster 4 townships tend to own the highest number of TV and mobile-phone. On the other hand, households in Cluster 5 tends to own the lowest of all the five ICT.
Note that the scale argument of ggparcoor() provide several methods to scale the clustering variables. They are:
std: univariately, subtract mean and divide by standard deviation.
robust: univariately, subtract median and divide by median absolute deviation.
uniminmax: univariately, scale so the minimum of the variable is zero, and the maximum is one.
globalminmax: no scaling is done; the range of the graphs is defined by the global minimum and the global maximum.
center: use uniminmax to standardize vertical height, then center each variable at a value specified by the scaleSummary param.
centerObs: use uniminmax to standardize vertical height, then center each variable at the value of the observation specified by the centerObsID param
nga_ngeo_clust.sf %>%
st_set_geometry(NULL) %>%
group_by(cluster) %>%
summarise(mean_pct_functional = mean(pct_functional),
mean_pct_nonFunctional = mean(pct_nonFunctional),
mean_pct_unknown = mean(pct_unknown),
mean_pct_handPump = mean(pct_handPump),
mean_pct_tapStand = mean(pct_tapStand),
mean_pct_uc300 = mean(pct_uc300),
mean_pct_uc1000 = mean(pct_uc1000),
mean_pct_uc250 = mean(pct_uc250),
mean_pct_urban0 = mean(pct_urban0))# A tibble: 5 × 10
cluster mean_pct_fun…¹ mean_…² mean_…³ mean_…⁴ mean_…⁵ mean_…⁶ mean_…⁷ mean_…⁸
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 51.7 29.0 9.25 35.0 0.525 44.1 45.5 0.539
2 2 46.2 42.2 11.3 59.9 1.00 70.6 28.3 1.03
3 3 40.5 49.4 9.15 9.87 0.304 19.2 80.4 0.345
4 4 18.4 14.6 67.0 18.1 0.337 72.1 27.5 0.410
5 5 78.9 20.7 0.295 88.7 0.0677 89.2 10.7 0.0903
# … with 1 more variable: mean_pct_urban0 <dbl>, and abbreviated variable names
# ¹mean_pct_functional, ²mean_pct_nonFunctional, ³mean_pct_unknown,
# ⁴mean_pct_handPump, ⁵mean_pct_tapStand, ⁶mean_pct_uc300, ⁷mean_pct_uc1000,
# ⁸mean_pct_uc250